Multi-System Fusion of Extended Context Prosodic and Cepstral Features for Paralinguistic Speaker Trait Classification
نویسندگان
چکیده
As automatic speech processing has matured, research attention has expanded to paralinguistic speech problems that aim to detect beyond-the-words information. This paper focuses on the identification of seven speaker trait categories from the Interspeech Speaker Trait Challenge: likeability, intelligibility, openness, conscientiousness, extraversion, agreeableness, and neuroticism. Our approach combines multiple features including prosodic, cepstral, shifted-delta cepstral, and a reduced set of the OpenSMILE features. Our classification approaches included GMM-UBM, eigenchannel, support vector machines, and distance based classifiers. Optimized feature reduction and logistic regression-based score calibration and fusion led to results that perform competitively against the challenge baseline in all categories.
منابع مشابه
Recognition of Paralinguistic Information using Prosodic Features Related to Intonation and Voice Quality
Besides the linguistic (verbal) information conveyed by speech, the paralinguistic (nonverbal) information, such as intenning the classification of paralinguistic information. Among the several paralinguistic items extions, attitudes and emotions expressed by the speaker, also convey important meanings in communication. Therefore, to realize a smooth communication between humans and spoken dial...
متن کاملComparing prosodic models for speaker recognition
Recently, speaker verification systems using different kinds of prosodic features have been proposed. Although it has been shown that most of these speaker verification systems can improve system performance using score-level fusion with stateof-the-art cepstral-based systems, a systematic comparison of the prosodic modelling algorithms used in these prosodic systems has not yet been performed....
متن کاملCombining five acoustic level modeling methods for automatic speaker age and gender recognition
This paper presents a novel automatic speaker age and gender identification approach which combines five different methods at the acoustic level to improve the baseline performance. The five subsystems are (1) Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) Support vector machine (SVM) based on GMM mean supervectors, (3) SVM based on GMM maxi...
متن کاملMultimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language Identification
Native language identification from acoustic signals of L2 speakers can be useful in a range of applications such as informing automatic speech recognition (ASR), speaker recognition, and speech biometrics. In this paper we follow a multistream and multi-rate approach, for native language identification, in feature extraction, classification, and fusion. On the feature front we employ acoustic ...
متن کاملA Study of the Relationship between Acoustic Features of “bæle” and the Paralinguistic Information
Language users benefit from special phonetic tools in order to communicate linguistic information as well as different emotional aspects and paralinguistic information through daily conversation. Having functions in conveying semantic information to listeners, prosodic features form the essential part of linguistic behavour, manipulating them potentially can play an important role in transmitt...
متن کامل